Many industrial and security applications employ a suite of sensors for detecting abrupt changes in temporal behavior patterns. These abrupt changes typically manifest locally, rendering only a small subset of the sensors informative. Due to resource constraints, continuous monitoring of every sensor can be expensive, which motivates the bandit quickest changepoint detection problem: sensing actions (or sensors) are chosen sequentially, and only measurements corresponding to the chosen actions are observed. We derive an information-theoretic lower bound on the detection delay for a general class of finitely parameterized probability distributions. We then propose a computationally efficient online sensing scheme that seamlessly balances the need to explore different sensing options against querying informative actions. We derive expected delay bounds for the proposed scheme and show that they match our information-theoretic lower bounds at low false alarm rates, establishing the optimality of the proposed method. Finally, we perform a number of experiments on synthetic and real datasets demonstrating the effectiveness of our approach.
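The abstract does not spell out the sensing scheme itself; as background, the standard single-sensor building block for quickest changepoint detection is the CUSUM statistic, which a bandit scheme would run on whichever sensor it queries. A minimal sketch (the Gaussian log-likelihood ratio in the usage note is purely illustrative):

```python
def cusum_detect(samples, log_lr, threshold):
    """CUSUM quickest changepoint detector for a single sensor stream.

    `log_lr(x)` is the log-likelihood ratio of the post-change vs.
    pre-change distribution at observation x; the statistic drifts
    down before the change and climbs after it. Returns the first
    time index at which the statistic exceeds `threshold`, or None
    if no change is declared.
    """
    stat = 0.0
    for t, x in enumerate(samples):
        stat = max(0.0, stat + log_lr(x))  # reflect at zero
        if stat > threshold:
            return t
    return None
```

For pre-change N(0, 1) and post-change N(1, 1), `log_lr(x) = x - 0.5`; the bandit variant discussed above additionally chooses which sensor to query at each step.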
The evaluation of abstractive summarization models typically uses test data that is identically distributed to the training data. In real-world practice, documents to be summarized may contain input noise caused by text extraction artifacts or data pipeline bugs. The robustness of model performance under the distribution shift caused by such noise is relatively under-studied. We present a large empirical study quantifying the sometimes severe loss in performance (up to 12 ROUGE-1 points) from different types of input noise for a range of datasets and model sizes. We then propose a lightweight method for detecting and removing such noise in the input during model inference, without requiring any extra training, auxiliary models, or even prior knowledge of the type of noise. Our proposed approach effectively mitigates the loss in performance, recovering a large fraction of the performance drop, sometimes as large as 11 ROUGE-1 points.
translated by 谷歌翻译
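The abstract leaves the detector unspecified; one plausible minimal instantiation, shown purely as a sketch (the `score` callback and `z_thresh` cutoff are hypothetical, not the paper's method), scores each input sentence with the summarization model's own likelihood and drops statistical outliers before summarizing:

```python
import statistics

def filter_noisy_spans(sentences, score, z_thresh=2.0):
    """Drop input sentences the model finds anomalously unlikely.

    `score(s)` is assumed to return a per-token log-likelihood of
    sentence s under the summarization model itself (no auxiliary
    model). Sentences whose score falls more than `z_thresh`
    standard deviations below the mean are treated as noise.
    """
    vals = [score(s) for s in sentences]
    mu = statistics.fmean(vals)
    sd = statistics.pstdev(vals) or 1.0  # guard against zero spread
    return [s for s, v in zip(sentences, vals) if (v - mu) / sd > -z_thresh]
```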
Multi-modal image-text models such as CLIP and LiT have demonstrated impressive performance on image classification benchmarks and their zero-shot generalization ability is particularly exciting. While the top-5 zero-shot accuracies of these models are very high, the top-1 accuracies are much lower (over 25% gap in some cases). We investigate the reasons for this performance gap and find that many of the failure cases are caused by ambiguity in the text prompts. First, we develop a simple and efficient zero-shot post-hoc method to identify images whose top-1 prediction is likely to be incorrect, by measuring consistency of the predictions w.r.t. multiple prompts and image transformations. We show that our procedure better predicts mistakes, outperforming the popular max logit baseline on selective prediction tasks. Next, we propose a simple and efficient way to improve accuracy on such uncertain images by making use of the WordNet hierarchy; specifically we augment the original class by incorporating its parent and children from the semantic label hierarchy, and plug the augmentation into text prompts. We conduct experiments on both CLIP and LiT models with five different ImageNet-based datasets. For CLIP, our method improves the top-1 accuracy by 17.13% on the uncertain subset and 3.6% on the entire ImageNet validation set. We also show that our method improves across ImageNet shifted datasets and other model architectures such as LiT. Our proposed method is hyperparameter-free, requires no additional model training and can be easily scaled to other large multi-modal architectures.
translated by 谷歌翻译
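As an illustration of the consistency idea (the exact aggregation rule and threshold are not given in the abstract, so the ones below are assumptions), one can flag an image as uncertain when its top-1 prediction changes across prompts and image transformations:

```python
import numpy as np

def flag_uncertain(logits_per_view, threshold=1.0):
    """Flag an image whose top-1 prediction is likely wrong.

    `logits_per_view` has shape (n_views, n_classes): one row of
    class logits per text prompt / image transformation. The image
    is flagged when the views disagree on the argmax, i.e. when the
    empirical frequency of the modal prediction falls below
    `threshold` (1.0 = require unanimous agreement).
    """
    preds = np.argmax(logits_per_view, axis=-1)
    _, counts = np.unique(preds, return_counts=True)
    consistency = counts.max() / len(preds)
    return consistency < threshold, consistency
```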
Federated learning (FL), in which multiple institutions collaboratively train a machine learning model without sharing data, is becoming popular. Participating institutions may not contribute equally: some contribute more data, some better-quality data, and some more diverse data. To fairly rank the contributions of different institutions, the Shapley value (SV) has become the method of choice. Exact SV computation is enormously expensive, especially when there are hundreds of contributors, so existing SV computation techniques use approximations. In healthcare, however, the number of contributing institutions may not be at massive scale, and computing exact SVs, while still expensive, is not impossible. For such settings, we propose an efficient SV computation technique called SaFE (Shapley Value for Federated Learning using Ensembling). We empirically show that SaFE computes values close to exact SVs, and that it outperforms current SV approximations. This is particularly relevant in medical imaging settings, where widespread heterogeneity across institutions is rampant and fast, accurate data valuation is needed to determine each participant's contribution in multi-institutional collaborative learning.
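The abstract does not define SaFE's internals; for concreteness, the exact Shapley computation that such methods approximate can be sketched as follows, where the coalition value function `value` is a hypothetical stand-in for, e.g., the validation accuracy of a model (or ensemble) trained on a coalition's pooled data:

```python
from itertools import combinations
from math import factorial

def exact_shapley(players, value):
    """Exact Shapley value of each player for a coalition game.

    `players` is a list of contributor ids; `value` maps a frozenset
    of players to the utility of the model trained on their pooled
    data. Cost is O(2^n) evaluations of `value`, feasible only for
    the moderate institution counts discussed above.
    """
    n = len(players)
    shapley = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # weight of coalitions of size k not containing p
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                total += weight * (value(s | {p}) - value(s))
        shapley[p] = total
    return shapley
```

For an additive game (each player's data contributes independently), the Shapley value recovers each player's individual contribution exactly, which makes a convenient sanity check.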
A recent trend in artificial intelligence is the use of pretrained models for language and vision tasks, which have achieved extraordinary performance but also puzzling failures. Probing these models' abilities in diverse ways is therefore critical to the field. In this paper, we explore the reliability of models, where we define a reliable model as one that not only achieves strong predictive performance but also performs well consistently over many decision-making tasks involving uncertainty (e.g., selective prediction, open set recognition), robust generalization (e.g., accuracy and proper scoring rules such as log-likelihood on in- and out-of-distribution datasets), and adaptation (e.g., active learning, few-shot uncertainty). We devise 10 types of tasks over 40 datasets to evaluate different aspects of reliability in both the vision and language domains. To improve reliability, we develop ViT-Plex and T5-Plex, pretrained large-model extensions for the vision and language modalities, respectively. Plex greatly improves the state of the art across reliability tasks and simplifies the traditional protocol, as it improves out-of-the-box performance and does not require designing scores or tuning the model for each task. We demonstrate scaling effects over model sizes up to 1B parameters and pretraining dataset sizes up to 4B examples. We also demonstrate Plex's capabilities on challenging tasks including zero-shot open set recognition, active learning, and uncertainty in dialogue language understanding.
Accurate uncertainty quantification is a major challenge in deep learning, as neural networks can make overconfident errors and assign high confidence predictions to out-of-distribution (OOD) inputs. The most popular approaches to estimate predictive uncertainty in deep learning are methods that combine predictions from multiple neural networks, such as Bayesian neural networks (BNNs) and deep ensembles. However, their practicality in real-time, industrial-scale applications is limited due to the high memory and computational cost. Furthermore, ensembles and BNNs do not necessarily fix all the issues with the underlying member networks. In this work, we study principled approaches to improve the uncertainty properties of a single network, based on a single, deterministic representation. By formalizing uncertainty quantification as a minimax learning problem, we first identify distance awareness, i.e., the model's ability to quantify the distance of a testing example from the training data, as a necessary condition for a DNN to achieve high-quality (i.e., minimax optimal) uncertainty estimation. We then propose Spectral-normalized Neural Gaussian Process (SNGP), a simple method that improves the distance-awareness ability of modern DNNs with two simple changes: (1) applying spectral normalization to hidden weights to enforce bi-Lipschitz smoothness in representations and (2) replacing the last output layer with a Gaussian process layer. On a suite of vision and language understanding benchmarks, SNGP outperforms other single-model approaches in prediction, calibration and out-of-domain detection. Furthermore, SNGP provides complementary benefits to popular techniques such as deep ensembles and data augmentation, making it a simple and scalable building block for probabilistic deep learning. Code is open-sourced at https://github.com/google/uncertainty-baselines
translated by 谷歌翻译
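The two changes can be sketched numerically. This is a simplified NumPy illustration, not the paper's implementation (which lives in the linked repository): spectral normalization clips the largest singular value of a weight matrix, and the output layer is replaced by a random-Fourier-feature approximation of a Gaussian process.

```python
import numpy as np

def spectral_normalize(W, norm_bound=1.0):
    """Rescale a weight matrix so its largest singular value is at
    most `norm_bound` -- the normalization SNGP applies to hidden
    layers to keep the feature extractor bi-Lipschitz
    (approximately distance-preserving)."""
    W = np.asarray(W, dtype=float)
    sigma = np.linalg.svd(W, compute_uv=False)[0]
    return W * min(1.0, norm_bound / sigma)

def rff_gp_logits(features, beta, W, b):
    """Gaussian-process output layer via random Fourier features.

    phi(h) = sqrt(2/D) * cos(h @ W + b), with frozen random W and b,
    approximates an RBF-kernel feature map; `beta` is the trainable
    readout. In SNGP, a Laplace approximation over `beta` then
    yields predictive variance that grows with distance from the
    training data."""
    num_rff = W.shape[1]
    phi = np.sqrt(2.0 / num_rff) * np.cos(features @ W + b)
    return phi @ beta
```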
Concern about overconfident mispredictions under distributional shift calls for extensive reliability studies of the graph neural networks used in critical tasks in drug discovery. Here we first introduce CardioTox, a real-world benchmark on drug cardiotoxicity, to facilitate such efforts. Our exploratory study shows that overconfident mispredictions tend to lie far from the training data. This leads us to develop a distance-aware GNN: GNN-SNGP. Through evaluation on CardioTox and three established benchmarks, we demonstrate GNN-SNGP's effectiveness in increasing distance awareness, reducing overconfident mispredictions, and making better-calibrated predictions without sacrificing accuracy. Our ablation study further reveals that the representation learned by GNN-SNGP improves distance preservation over its base architecture and is a major contributor to the improvement.
Various eye diseases and high myopia affect the size of the foveal avascular zone (FAZ), an anatomical reference point. It is therefore important to segment and quantify the FAZ accurately. To the best of our knowledge, no automated tool or algorithm is available for segmenting the FAZ in the deep retinal layer. This paper introduces new open-access software with a user-friendly graphical user interface (GUI) and compares its results against ground truth (manual segmentation).
Recently, uncertainty estimation in deep learning has become a key area for improving the reliability and robustness of safety-critical applications. While many proposed methods focus either on distance-aware model uncertainty for out-of-distribution detection or on input-dependent label uncertainty for in-distribution calibration, both types of uncertainty are often necessary. In this work, we propose the HetSNGP method for jointly modeling model and data uncertainty. We show that our proposed model affords a favorable combination of these two types of uncertainty and thus outperforms baseline methods on several challenging out-of-distribution datasets, including CIFAR-100C, ImageNet-C, and ImageNet-A. Moreover, we propose HetSNGP Ensemble, an ensembled version of our method that additionally models uncertainty over the network parameters and outperforms other ensemble baselines.
Optimal decision making requires that classifiers produce uncertainty estimates consistent with their empirical accuracy. However, deep neural networks are often under- or over-confident in their predictions. Consequently, methods have been developed to improve the calibration of their predictive uncertainty, both during training and post hoc. In this work, we propose differentiable losses to improve calibration, based on a soft (continuous) version of the binning operation underlying popular calibration-error estimators. When incorporated into training, these soft calibration losses achieve state-of-the-art single-model ECE across multiple datasets, with less than a 1% decrease in accuracy. For instance, we observe an 82% reduction in ECE (70% relative to the post-hoc rescaled ECE) in exchange for a 0.7% relative decrease in accuracy, compared with a cross-entropy baseline on CIFAR-100. When incorporated post-training, the soft-binning-based calibration error objective improves temperature scaling, a popular recalibration method. Overall, experiments across losses and datasets demonstrate that using calibration-sensitive procedures yields better uncertainty estimates under dataset shift than the standard practice of training with cross-entropy loss and applying post-hoc recalibration.
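A minimal sketch of the soft-binning idea (the bin count, temperature, and Gaussian-style soft assignment below are illustrative choices, not necessarily the paper's): each confidence is given a soft membership over bins, and the usual binned calibration-error formula is evaluated with those soft weights, making the estimator differentiable.

```python
import numpy as np

def soft_ece(confidences, correct, num_bins=15, temperature=0.01):
    """Soft (differentiable) version of expected calibration error.

    Instead of hard-assigning each prediction's confidence to one
    bin, assign soft memberships via a softmax over squared
    distances to bin centers; as temperature -> 0 this recovers the
    usual hard-binned ECE. Written in NumPy for clarity -- in
    training, the same formula would be used in an autodiff
    framework.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    centers = (np.arange(num_bins) + 0.5) / num_bins
    # soft membership of each example in each bin: shape (N, B)
    logits = -((confidences[:, None] - centers[None, :]) ** 2) / temperature
    u = np.exp(logits - logits.max(axis=1, keepdims=True))
    u /= u.sum(axis=1, keepdims=True)
    mass = u.sum(axis=0)  # soft example count per bin
    acc = (u * correct[:, None]).sum(axis=0) / np.maximum(mass, 1e-12)
    conf = (u * confidences[:, None]).sum(axis=0) / np.maximum(mass, 1e-12)
    return float(np.sum(mass / len(confidences) * np.abs(acc - conf)))
```

A well-calibrated batch (confidence matching accuracy) yields a value near zero, while an overconfident batch yields roughly the confidence-accuracy gap.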